(57) Abstract

A nucleic acid expression construct comprising: (a) a promoter; (b) an intron
whose natural position is within the 5'-untranslated region of a gene from which
it is derived; (c) a coding sequence; and (d) a 3'-flanking sequence, wherein
the intron (b) is not derived from the same gene as that from which either the
promoter (a) or the protein-coding sequence (c) is derived; and processes,
vectors, hosts and uses involving such a construct to obtain inter alia an
increase in the level of expression of the coding sequence.

HETEROLOGOUS EXPRESSION OF PROTEINS BY "RESCUED" VECTOR COMPRISING AN INTRON 

This invention relates to the expression of proteins in heterologous host systems, 
particularly in, but not limited to, the mammary gland of transgenic animals. 

It has been shown, using regulatory DNA elements from milk protein genes, that it
is possible to express heterologous proteins in the milk of transgenic livestock.
One such gene, that for ovine β-lactoglobulin (BLG), has been cloned and
characterised (Ali and Clark, J. Mol. Biol., 199 415-426 (1988)). The authors
subsequently demonstrated consistent, high-level expression of ovine BLG in the
milk of mice transgenic for the entire gene (Simons et al., Nature, 328 530-532
(1987); Harris et al., Developmental Genetics, 12 299-307 (1991)). Further
experiments demonstrated that the BLG promoter region can direct high levels of
expression of a heterologous human protein to the milk of transgenic mice
(Archibald et al., Proc. Natl. Acad. Sci. USA, 87 5178-5182 (1990)). The
generation of sheep expressing human proteins in their milk using BLG regulatory
elements indicated that this technology was applicable to transgenic livestock
(Simons et al., Bio/Technology, 6 179-183 (1988); Clark et al., Bio/Technology, 7
487-492 (1989)). The commercial feasibility of this technology, as a means of
producing recombinant therapeutics in livestock milk, has been confirmed by the
demonstration of high-level expression of human α1-antitrypsin in the milk of
transgenic sheep (Wright et al., Bio/Technology, 9 830-834 (1991); Carver et al.,
Cytotechnology, 9 77-84 (1992); Carver et al., Bio/Technology, 11 1263-1270
(1993); Cooper and Dalrymple, The Japanese Journal of Experimental Medicine,
Developmental Biotechnology supplement, 12(2) 124-132 (1994)).

This high level of expression of a heterologous protein in livestock milk was the
result of using a fusion of the BLG promoter region to human genomic sequences
(Wright et al., Bio/Technology, 9 830-834 (1991)). Analogous cDNA-based
constructs were poorly expressed in transgenic mice (Whitelaw et al., Transgenic
Res., 1 3-13 (1991)). Despite some notable exceptions in the field as a whole
(Ebert et al., Bio/Technology, 9 835-838 (1991); Velander et al., Proc. Natl. Acad.
Sci. USA, 89 12003-12007 (1992)), the generally inefficient expression of
cDNA-based constructs is well documented (Brinster et al., Proc. Natl. Acad. Sci.
USA, 85 836-840 (1988); Palmiter et al., Proc. Natl. Acad. Sci. USA, 88 478-482
(1991); Whitelaw et al., Biochem. J., 286 31-39 (1992)). Observed problems include
the influence of chromosomal position effects and distinct spatial and/or temporal
expression in lines transgenic for the same construct. Such constructs can be
improved by the addition of some natural or heterologous introns. However,
expression levels from such constructs rarely match levels attained with constructs
containing some or all natural introns in the region encoding a heterologous
protein. The successful use of less than a full complement of introns is the subject
of WO-A-9005188. In spite of that useful advance in the art, however, the genetic
material encoding many potential target human proteins which may be produced by
the transgenic mammary gland is very often limited to cDNAs, owing to the
immediate non-availability or the size of the natural gene. As such, a technique
giving more consistent expression from transgene constructs containing intronless
cDNA sequences is highly desirable.

A further advance in the expression of cDNAs is the so-called "rescue"
technology, an approach developed by Clark and co-workers (Clark et al.,
Bio/Technology, 10 1450-1454 (1992); WO-A-9211358) to overcome cDNA-related
expression problems. It makes use of the observation that co-injection of
an actively expressed transgene, such as the entire ovine BLG gene, together with
an intronless construct results in expression of the second construct, where no
expression is achieved when it is injected alone. Clark and colleagues have
demonstrated the expression of up to 800 µg/ml of human α1-antitrypsin (AAT) in
the milk of mice transgenic for both BLG and an intronless human AAT construct.
In mice transgenic for the latter construct alone, only one out of eight mice
expressed, and this at a level of only 3.9 µg/ml. Similarly, using this technology,
an expression level of lOfyzg/ml from a wild type human protein C cDNA
construct was achieved (WO-A-9211358). This represents approximately 20% of the
expression level obtained with an equivalent genomic-based construct.
The cDNA construct alone gave no expression in 11 lines of transgenic mice.

The "rescue" phenomenon has been rationalised as follows. Strongly expressing
genes have an innate ability to 'dominate' their chromosomal environment such
that they are able to initiate and maintain a high expressing state. Intronless genes
are deficient in some, as yet unidentified, feature which provides them with this
capability. However, the dominant effect of the strong gene extends some way 5'
and 3' to the gene itself and therefore, by linking a 'weak' and a 'strong' gene, some
of the properties of the high expressing gene are conferred on the intronless gene.
Clark and colleagues propose that this probably results in an open chromatin
conformation associated with the actively expressing gene which encompasses
adjacent intronless genes. The actively expressing gene may thus create a
permissive domain allowing access to the intronless genes by the transcriptional
machinery of the cell. In the absence of adjacent actively expressing genes, the
intronless construct may be inaccessible, probably residing in condensed
chromatin. Other possible explanations for this phenomenon include enhancer-like
sequences, present in the actively expressing gene but absent from the intronless
construct, interacting positively with the latter, or simply that the actively expressing
gene insulates the intronless gene from the negative effects of adjacent chromatin.

To take advantage of "rescue" technology, we have constructed a vector, pMAD,
from the ovine BLG gene for the cloning of cDNAs (Figures 1 and 2). This vector
contains the same 5' and 3' flanking sequences present in the BLG gene which
itself always gives rise to high level expression in transgenic mice. However, it
lacks all coding sequences and introns of the intact gene. Cloning of cDNAs in the
unique EcoRV site between 5' and 3' flanking sequences results in constructs
suitable for expression by the "rescue" approach. However, the co-injection
of two covalently unlinked genes is not without its difficulties. There is
always the risk that one gene or other is not represented in the final transgenic
lines. Additionally, the two different genes may be present but not at the same
locus, in which case they may subsequently segregate upon breeding. Finally, the
physical structure of a BLG/pMAD array is not determined prior to injection and
there is no control over it. The relative copy numbers of the two genes may vary,
especially if the DNA concentrations of the two constructs are not tightly controlled.

cDNAs have been successfully expressed at high levels in a limited number of
cases. It is not clear from the literature why this should be the case. However,
the fact is that a cDNA has never (to our knowledge) been expressed at high levels
from a BLG construct other than by rescue.

We had noted the work of Brinster and Palmiter (ibid) and others and we sought to
incorporate a BLG intron into our cDNA constructs. To this end the vector
pMAD6 was constructed, containing almost all the BLG sequence 3' to the natural
BLG stop codon, i.e. a portion of exon 6, intron 6, all of exon 7 and those
available sequences downstream of the polyadenylation site (see Figures 1 and 2).
A protein C cDNA in this vector (pCORP3) expresses at detectable levels but not
nearly as well as "rescued" intronless pCORP2 (see Table 2). Thus we can
conclude that the mere presence of a BLG intron is insufficient to achieve high
level expression.

Noting that certain genes have an intron in the 5'-untranslated region (5'-UTR), we
engineered the natural BLG first intron into the 5'-UTR of the BLG sequences in
pMAD (to give pMAD1) and into pMAD6 (to give pMAD16). When protein C
cDNAs were put into these vectors, there was no detectable expression of protein
C in the milk of lactating female transgenics (see Table 2; pCORPo). This indicates
that the mere presence of intronic sequences in the 5'-UTR of a gene is in general
insufficient to allow expression of a cDNA.

We have now found, however, that if, instead of the BLG first intron, an intron
whose natural position is within the 5'-untranslated region of its gene is used, good
expression results.

According to a first aspect of the invention, there is provided an expression
construct comprising:

(a) a promoter;

(b) an intron whose natural position is within the 5'-untranslated region
of a gene from which it is derived;

(c) a coding sequence; and

(d) a 3'-flanking sequence,

wherein the intron (b) is not derived from the same gene as that from which either
the promoter (a) or the coding sequence (c) is derived, and, in particular, wherein
the promoter (a) drives expression of the coding sequence (c) at a level which is
elevated by virtue of the presence of the intron (b).
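
The structural requirement of the first aspect can be pictured as a simple data check. The following Python sketch is illustrative only: the class and field names (`Element`, `Intron`, `naturally_in_5utr`, `satisfies_first_aspect`) are hypothetical and not part of the specification, and the origin of each element is reduced to a plain text label. It tests only the two structural conditions (natural 5'-UTR position of the intron, and an intron derived from a gene other than that of the promoter or coding sequence), not the functional requirement of elevated expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    source_gene: str  # label for the gene the element is derived from


@dataclass(frozen=True)
class Intron(Element):
    naturally_in_5utr: bool  # True if its natural position is in the 5'-UTR


@dataclass(frozen=True)
class Construct:
    promoter: Element          # element (a)
    intron: Intron             # element (b)
    coding_sequence: Element   # element (c)
    flank_3prime: Element      # element (d)


def satisfies_first_aspect(c: Construct) -> bool:
    """Check the two structural conditions placed on the intron (b)."""
    heterologous = c.intron.source_gene not in (
        c.promoter.source_gene,
        c.coding_sequence.source_gene,
    )
    return c.intron.naturally_in_5utr and heterologous


# Hypothetical example elements, labelled for illustration only.
blg_promoter = Element("BLG promoter", "ovine BLG")
protein_c = Element("protein C cDNA", "human protein C")
blg_3prime = Element("BLG 3' sequence", "ovine BLG")

# An intron from a different gene whose natural position is in the 5'-UTR.
casein_like = Construct(
    blg_promoter,
    Intron("beta-casein intron 1", "bovine beta-casein", naturally_in_5utr=True),
    protein_c,
    blg_3prime,
)
# The BLG first intron moved into the 5'-UTR: same gene as the promoter.
blg_intron_like = Construct(
    blg_promoter,
    Intron("BLG intron 1", "ovine BLG", naturally_in_5utr=False),
    protein_c,
    blg_3prime,
)

print(satisfies_first_aspect(casein_like))      # True
print(satisfies_first_aspect(blg_intron_like))  # False
```

The second example fails both conditions at once; the check is deliberately a conjunction, so an intron from the promoter's own gene is rejected even if it happened to lie naturally in a 5'-UTR.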

Elevated levels of expression include expression where previously none was
measurable (or obtained). An elevated level is optionally defined as a level higher
than that obtained by the construct without the intron (and optionally the 3' flanking
sequence) described above.



Preferably the expression construct is a DNA expression construct. Preferably the
coding sequence is a protein-coding sequence, although it may code for non-protein
substances such as ribozymes. The construct is effective for two particular
reasons: firstly, the promoter (a) drives expression of the coding sequence (c) at a
level which is elevated by virtue of the presence of the intron (b); and/or secondly,
the coding sequence (c) is more likely to be expressed in a transgenic host by
virtue of the presence of the intron (b). The second effect is particularly important
when taking into account the length of time and the efforts required to produce
transgenic animals useful as bioreactors for the production of useful proteins, etc.
It is also important in laboratory scale trials to determine and obtain transgenic
hosts. Use of constructs as described in the claims has shown that an increased
number of transgenic hosts express the coding sequence compared with use of
constructs without the specific intron described herein (e.g. see the number of
expressing founders in Table 3). The elevated level of expression of the coding
sequence and/or the expression of the coding sequence may be by virtue of the
presence of the intron (b) and the 3'-flanking sequence (d).

The DNA expression construct may be useful for expression in any suitable host
system such as, for example, prokaryotes (e.g. E. coli), fungi, plant and animal
(including mammalian) cell lines and transgenic plants and animals (including
mammals). However, it is in transgenic animal hosts that the expression constructs
of the invention are most useful. In principle, the invention is applicable to all
animals, including birds such as domestic fowl, amphibian species and fish species.
The protein may be harvested from body fluids (such as milk, blood or urine) or
other body products (such as eggs, where appropriate). In practice, it will be to
(non-human) mammals, particularly placental mammals, that the greatest
commercially useful applicability is presently envisaged. This is because
expression in the mammary gland, with subsequent optional recovery of the
expression product from the milk, is a proven and preferred technology. It is with
ungulates, particularly economically important ungulates such as cattle, sheep,
goats, water buffalo, camels and pigs, that the invention is likely to be most useful.
The generation and usefulness of such mammalian transgenic mammary
expression systems are both generally, and in certain instances specifically, disclosed
in WO-A-8800239 and WO-A-9005188.

In this text, the meaning of a sequence being derived from a gene does not require
that the sequence has actually been obtained from the gene in question. Rather, all
and any copy, as well as the original sequence, is meant. Further, any modification
to the sequence which does not remove the desired end result can be used.



In addition to being useful in transgenic animal expression for non-therapeutic 
purposes (as far as the host is concerned), constructs of the invention may also be 
useful in genetic therapy in humans or other animals. 

The promoter can be any suitable promoter chosen from a gene different from the
source of the intron (b). Within that constraint, it will be chosen having regard to
its desired properties in the construct of the expression system to be used and its
ability to drive expression of heterologous sequences in cell culture or in a
transgenic organism. A promoter is any sequence which drives expression of a
coding sequence. For example, the BLG promoter does not express particularly
highly in cells which do not respond to prolactin (such as COS cells). A 'cell'
promoter according to the invention is the HCMV (human cytomegalovirus) IE
gene promoter. Other promoters of the invention include endothelial promoters
such as vascular cell adhesion molecule (VCAM), platelet endothelial cell adhesion
molecule-1 (PECAM), inter-cellular adhesion molecule-2 (ICAM) and smooth
muscle promoters, such as Desmin E and Desmin P. A preferred promoter is one
which drives expression of the protein-coding sequence in mammalian cells. In
relation to expression in transgenic animal hosts, the preferred expression system
involves expression in the mammary gland of transgenic placental mammals. For
this purpose, milk protein promoters will generally be used, preferably but not
necessarily derived from the species chosen as an expression host. The promoter
may be a casein promoter (such as an α-, β- or κ-casein promoter), but it is
preferred that it be a non-casein promoter, such as the human Bile Salt Stimulated
Lipase (BSSL) promoter, more preferably a whey protein promoter, such as that of
whey acidic protein (WAP), α-lactalbumin or, most preferred of all,
β-lactoglobulin. Figure 3 is a schematic representation of the cloning of pCASLAC
and obtaining transgene constructs therefrom. pCASLAC corresponds to
pCASMAD6 (see Fig. 2, 4 and 7) only using the more tightly regulated α-lac
promoter. Of course, the present invention covers promoters, in the constructs
described, which have not yet been isolated or characterized. One general way of
isolating specific promoters (such as mammalian promoters) for use in the present
invention is to isolate specific cDNAs by differential display or from subtractive
cDNA libraries. These, in turn, are used to screen genomic libraries for the
cognate promoters.

In addition, the present invention encompasses the use of a naturally occurring,
low-expressing promoter that has been modified in vitro to give an increased level
of expression (e.g. by addition of an enhancer), or the use of a promoter which
shows a higher level of expression in a different species (e.g. the human
α-lactalbumin promoter expresses better in mice than the endogenous mouse promoter).

A promoter according to the invention may also be a viral or modified cellular
promoter or a completely artificial promoter having the properties of high level
expression (preferably in mammalian species). Details of suitable promoters can be
found in Houdebine, J-M., J. Biotech., 34: 269-287 (1994); Garner, I. &
Dalrymple, M., in "Encyclopedia of Molecular Biology: Fundamentals and
Applications", Robert A. Myers (Ed.), Weinheim, NY.

Element (b) of a construct of the invention is an intron whose natural position is
within the 5'-untranslated region (5'-UTR) of its natural gene (i.e. the gene with
which it is naturally associated). The whole intron is not necessarily required.
Fragments or portions may be sufficient. The requirement for the present
invention is that the level of protein expression, from any construct according to
the invention, is elevated by virtue of the presence of the intron, or parts thereof. It
has been shown that the first and third portions of an intron (which has been
divided into three fairly equal parts), recombined, are often effective. Generally
speaking, the intron for inclusion in a construct according to the invention will be
the first intron of such a gene. Examples of genes with such known introns
include: human and rat aldolase A, human type II IL-1 receptor, human
UDP-N-acetylglucosaminyl transferase, mouse involucrin and mouse adenosine deaminase.
Some genes have more than one intron whose natural positions are all within the
5'-untranslated region of the natural gene. The present invention recognises this and
covers, within element (b), one or more of such introns (for example, in a gene
with two introns naturally positioned in the 5'-UTR, they may be included in a
construct according to the invention separately, together, or as parts of each
conjoined). These and other as yet unidentified introns whose natural position is
within the 5'-untranslated region of their natural gene may be used according to the
present invention. Also included are the introns of several gene families,
including: the actin family (two skeletal muscle actins, alpha cardiac and alpha
skeletal; two smooth muscle actins, alpha smooth and gamma smooth; and two
non-muscle actins, beta and gamma cytoplasmic actin), the troponin family (cardiac,
skeletal and foetal troponins) and the casein family (αS1, αS2, β and κ).
Preferably the intron is the first intron of the family. In the case of transgenic
mammary specific expression, the most preferred gene family from which the
intron may come is the casein family or the actin family. The intron may
preferably be from the same source of organism as the promoter and/or the
expression system which it is in (e.g. mammalian, bovine, ovine, etc.).

DNA expression constructs of the present invention are different from those of
Barash et al. (Nucl. Acids Res. 24(4) 602-610 (1996)), in that Barash et al.'s
constructs include β-lactoglobulin intragenic sequences which are not within the
5'-untranslated region. Barash et al. do not refer to the possibility of using an intron
whose natural position is within the 5'-untranslated region of its natural gene.

Caseins, whose genes represent a preferred source of the 5'-UTR introns useful in
the present invention, are the major mammalian milk proteins and are encoded by a
small gene family, which in cows and sheep consists of four members, αs1, αs2, β
and κ, and in mice and rats five, α, β, γ, ε and κ (Yu-Lee & Rosen, J. Biol.
Chem., 258 10794-10804 (1983); Jones et al., J. Biol. Chem., 206 7042-7048
(1985); Thompson et al., DNA, 4 263-271 (1985); reviewed by Mercier &
Vilotte, J. Dairy Sci., 76 3079-3098 (1993)). The evolution of the calcium
sensitive caseins (α and β) is believed to have occurred by recruitment of exons
encoding discrete functional domains, followed by intragenic and intergenic
duplication to create the present number of similar exons within a given gene, and
of genes within a family (Jones et al., J. Biol. Chem., 206 7042-7048 (1985);
Groenen et al., Gene, 123 187 (1993); reviewed by Mercier & Vilotte, J. Dairy
Sci., 76 3079-3098 (1993)). There is no evidence that κ casein is evolutionarily
related to the other caseins. Both in sequence homology and protein function it
appears to be related to γ fibrinogen (Jolles et al., Biochim. Biophys. Acta, 365
335 (1974); Thompson et al., DNA, 4 263-271 (1985); Alexander et al., Eur. J.
Biochem., 178 395-401 (1988)), which performs a cleavage-induced clotting
function in blood similar to the clotting function of κ casein in the stomach. The
caseins all map to a single chromosome in rodents, sheep, cows, humans and pigs
(reviewed by Mercier & Vilotte, J. Dairy Sci., 76 3079-3098 (1993)); all four
bovine caseins have been mapped to a single 250 Kb locus (Ferretti et al., Nucleic
Acids Res., 18 6829-6833 (1990); Threadgill & Womack, Nucleic Acids Res., 18
6935-6942 (1990)) and all five mouse caseins to a 400 Kbp region (Tomlinson et
al., Mammalian Genome, 7 542-544).

The first intron of the calcium sensitive casein genes is naturally positioned in the
5'-UTR, upstream of the start of translation. The position of this intron is
conserved across species barriers, indicating that there may be some critical
function for an intron in this position. The intron may be obtained by PCR
amplification from genomic DNA. The resulting DNA fragment may be cloned
into a suitable site of an appropriate vector, such as the pMAD6 vector described
above.



Constructs of the invention also contain a coding sequence (c), whose expression is
driven by the promoter (a) under the beneficial influence of the intron (b). The
protein-coding sequence may code for any (natural or modified) protein of interest,
particularly those which may be advantageously produced in the preferred
mammary gland expression systems. Examples of classes of such proteins, and
specific instances within those classes, are as follows: blood proteins involved in
haemostasis including factors V, VII, VIII, IX, X, XIII, PAI-1, PAI-2, TFPI,
protein C (details of protein C according to the present invention can be found for
example in EP-A-191606 and WO97/20043), protein S, alpha 1-antitrypsin (AAT)
(details of which can be found in general from Perlino et al., EMBO Journal, 6,
2767-2771, 1987 and WO90/05188), tPA, fibrinogen (details for which may be
found in WO95/23868 and the references cited therein); other protease inhibitors
such as serpins, Kazal/Kunitz inhibitors, kininogens, stefins, cystatins or tissue
inhibitors of metalloproteinases; growth factors; protein hormones; structural
proteins such as collagens (details of which may be found in WO93/07889,
WO94/16570, WO97/08311 and the references cited in these publications) and
keratins; enzymes such as lipases, other proteases and transferases; and antibodies.
While the protein-coding sequence may in principle be any suitable sequence, such
as either the full natural genomic structure, a minigene sequence containing
some, but not all, of the introns naturally present in the gene, or a cDNA
(containing no introns), it will generally be with cDNA sequences that the
invention is most useful. This is because the invention may conveniently enable
the expression of protein from cDNAs which may otherwise only be achievable
using minigenes or full genomic sequences. Furthermore, some proteins may be
expressed in nature from intronless genes (e.g. bacterial or yeast genes, human
thrombomodulin) or have natural intron structures incompatible with the chosen
host (e.g. invertebrate or plant genes in a mammalian cell). In these cases the
'cDNA' route is the only one available.
The intron (b) is preferably positioned upstream of the translation start site for the
protein-coding sequence (c), by analogy with its position in its natural
environment.

Particularly preferred constructs according to the present invention are the BLG
promoter with either (i) the first intron from bovine β-casein or (ii) the first intron
from muscle cardiac actin or (iii) the first intron from ovine β-casein. More
preferably, the 3' flanking sequence is from BLG as described below for preferred
3' sequences under (i). Particularly preferred constructs of the present invention
include the following:

(i) BLG promoter + bovine β-casein intron 1 + BLG 3' sequence
(particularly the 3' sequence beginning immediately 3' to the natural
β-lactoglobulin stop codon and continuing to at least about 30 bases 3' of the
poly-A site), optionally including ovine beta-lactoglobulin intron 6
(preferably positioned 5' to the flanking sequence and 3' to any coding
sequence); in particular the construct pCASMAD6 as described in Fig. 2, 4
or 7;

(ii) BLG promoter + muscle cardiac actin intron 1 + BLG 3' sequence
(particularly the 3' sequence beginning immediately 3' to the natural
β-lactoglobulin stop codon and continuing to at least about 30 bases 3' of the
poly-A site), optionally including ovine beta-lactoglobulin intron 6
(preferably positioned 5' to the flanking sequence and 3' to any coding
sequence); in particular the construct pACTMAD6 as described in Fig. 2, 5 or 7;

(iii) BLG promoter + ovine β-casein intron 1 + ovine β-casein 3' flanking
sequence; in particular the construct pBOB as described in Fig. 2, 6 or 7.

Preferably, following the coding sequence will be such 3'-sequences as may be
necessary or appropriate. In the invention at its broadest, it is not thought that the
nature of such 3'-sequences is particularly limited. The 3'-flanking sequence may
or may not include its natural intron. Suitable 3' flanking sequences preferably
comprise functional elements which are able to direct the correct transcription,
termination and 3' end processing. These can be determined, without undue
burden, by the person skilled in the art. However, certain 3'-flanking sequences
have been found to be particularly useful. These include, but are not restricted to:
(i) a poly-A site (poly A addition site), (ii) a β-lactoglobulin gene 3'-sequence
beginning immediately 3' to the natural β-lactoglobulin stop codon and continuing
to at least about 30 bases 3' of the poly-A site (as found in pMAD6,
pCASMAD6 and pACTMAD6), or (iii) β-casein 3' sequences including the poly A
signal. These sequences (as used in pBOB) consist of 6.5 Kbp of DNA
incorporating ovine β-casein exons 7 to 9, introns 7 and 8, and approximately
4.8 Kbp of 3' sequence.

The presence of such 3'-sequences in the construct adds stability to it. It is 
believed that the relative orientation of the first and last intron may contribute to 
this stability. 

The β-lactoglobulin gene 3'-sequence may be cloned from a β-lactoglobulin gene
or amplified by PCR and cloned from genomic DNA, which may be of ovine
origin. As mentioned above, it begins with the natural β-lactoglobulin gene
sequence immediately 3' to the stop codon, which is a TAG codon occurring in
exon VI. It extends to at least about 30 bases 3' of the poly-A site, which is in
exon VII. Exons VI and VII bracket intron 6, which is present in its entirety. The
preferred minimum length of the β-lactoglobulin-derived 3'-sequences is about 2.3
Kb. For additional preference, at least about 50 bases 3' to the poly A site are
present. Similarly, the β-casein 3' sequences may be cloned from the β-casein
gene or amplified by PCR and cloned from genomic DNA, which may be of ovine
origin.

Appropriate signal and/or secretory sequences, operably linked to the construct 
may be present if necessary or desirable. 

5 

In other aspects, the invention is directed to: 

• a process for the preparation of a construct according to any feature of the 
first aspect. The process comprises linking together selected nucleotide 

10 bases and/or nucleotide sequences; 

• a vector comprising a construct according to the first aspect of the 
invention. The vector may be plasmid, phage, cosmid or other vector type, 
for example derived from yeast. The vector may be an expression vector; 

15 

• a process for the preparation of a vector described above, comprising 
introduction of a construct according to the first aspect of the invention into 
a vector construct; 

20 • a process for the preparation of a host (preferably an expression host), the 
process comprising introducing a DNA expression construct (or a vector), 
as described above, into a suitable organism; the process, in particular 
provides a host which expresses elevated levels of the coding sequence in 
the construct; 

25 

• a host organism (preferably an expression host organism) incorporating a 
DNA expression construct (or a vector) as described above (and preferably 
capable of giving rise to expression of protein encoded by the construct, 
although non-expressing hosts such as Escherichia coli and other 

30 procaryotes may be useful as cloning hosts); The host may be a eukaryotic 
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or prokaryotic cell/organism, such as bacteria, insect or yeast cells, as well 
as animal tissues (ceils in culture) and animals themselves. Such animals 
are transgenic and preferred transgenic animals include mammals, in 
particular non-human placental mammals such as pigs, sheep, cattle and 
goats. Preferably the host (e.g. transgenic animal) according to the 
invention has the construct (of the first aspect of the invention) integrated 
into its genome. It is particularly preferred that the transgenic animal 
transmits the construct to its progeny, thereby enabling the production of at 
least one subsequent generation of producer animals. Such a host 
organism, in particular expresses elevated levels of the coding sequence in 
the construct; 



a process of preparing a protein, the process comprising allowing an 
expression host to express a DNA expression construct as described above, 
and optionally subsequently purifying the protein; 

a protein when prepared by such a process. The protein may be a fusion 
protein; 

the use of a nucleic acid expression construct comprising a promoter, an 
intron whose natural position is within the 5'-untranslated region of a gene 
from which it is derived, a coding sequence and a 3' flanking sequence to 
obtain a transgenic host, preferably with elevated levels of the expressed 
coding sequence; 

the use of a nucleic acid construct comprising a promoter, an intron whose 
natural position is within the 5' untranslated region of a gene from which it 
is derived, a coding sequence and a 3' flanking sequence to increase the 
likelihood of expression of the coding sequence from a transgenic host 
which incorporates the nucleic acid construct; 
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10 



a process for improving the likelihood that an individual transgenic host, or a
number of transgenic hosts, will express a transgene coding sequence, the process comprising
introducing into a host, a nucleic acid construct comprising a promoter, an 
intron whose natural position is within the 5' untranslated region of a gene 
from which it is derived, a coding sequence and a 3' flanking sequence. 

In addition to the construct according to the first aspect of the invention, there is 
provided an empty "cassette", including all features in claim 1, without the coding
sequence. Such a "cassette" provides an easy means by which to provide a high 
expressing vector for other parties to use by simply introducing coding sequences 
of interest by restriction endonuclease cutting of the empty cassette and religation 
(according to standard techniques). The empty "cassette" is for use with an 
incorporated coding sequence of interest. 

Preferred features for each aspect of the invention are as for each other aspect
mutatis mutandis. 



The present invention also provides, as a separate aspect, the novel expression of 
collagen cDNA (natural procollagen chains or modified collagen). Preferably the 
collagen cDNA is expressed via a construct according to the first aspect of the 
invention. Preferred features of and all different aspects of the invention described 
herein above in relation to the construct, also apply to the expression of collagen 
cDNA. Particular preferred details in relation to collagen are described above 
under a discussion of the protein-coding sequences, including references thereto. 
For example, for the expression of all collagen DNA (cDNA or otherwise), 
expression hosts may co-express prolyl 4-hydroxylase, which is a post-translational 
enzyme important in the natural biosynthesis of procollagen. 






The invention will now be illustrated by the following examples. The examples 
refer to the drawings, in which:

FIGURE 1 shows the β-lactoglobulin (BLG) sequences and the plasmids
pMAD and pMAD6.

FIGURE 2 shows the origin of sequences present in plasmids pMAD, 
pMAD6, pCASMAD6 and pACTMAD6. 

FIGURE 3 shows the construction of pCASLAC 

FIGURE 4 shows the construction of pCASMAD6. 

FIGURE 5 shows the construction of pACTMAD6. 



FIGURE 6 shows the construction of pBOB. 

FIGURE 7 shows details of pMAD, pMAD6, pCASMAD6, pACTMAD6
and pBOB.

Preferred embodiments of the invention are based on the use of the BLG promoter,
and are designed to express cDNAs from the BLG gene. The structure of pMAD6
is indicated in Figures 1, 2 and 7. This vector contains the same 5' and 3' flanking
sequences present in the ovine BLG gene which itself always gives rise to high
level expression in transgenic mice. However, it lacks all protein coding sequences
and introns 1 to 5 of the intact gene.

The 3' non coding exons of the gene remain in this vector together with the final
intron of the BLG gene. Cloning of cDNAs in the unique EcoRV site between 5'
and 3' flanking sequences results in constructs suitable for expression of cDNAs.
Incorporation of the BLG 3' sequences is not essential for the invention. Such
BLG 3' sequences can be substituted by any competent 3' flanking sequences with
or without an intron situated downstream of the last (stop) codon of such a gene.

In outline, the first intron of the bovine β-casein gene was amplified by PCR from
genomic DNA. The resulting DNA fragment, of approximately 2 Kbp, was cloned
and subsequently subcloned into the EcoRV site of pMAD6 in such a way that the
original EcoRV site was destroyed and reformed on the 3' side of the intron. This
gave the vector pCASMAD6 (Figures 1, 2, 4 and 7). A cDNA encoding human
protein C was inserted into the unique EcoRV site of pCASMAD6 and the new
construct called pCOR169. The cDNA utilised encodes a mutant form of the
natural protein C (PC962): the mutation was designed to allow more efficient
processing of the mature protein (Foster et al., Biochemistry, 29:347-354 (1990)).

This mutant form of the human protein C cDNA has been incorporated into a
construct pCORP9, exactly analogous to pCORP2 (see WO-A-9211358). In
"rescue" experiments pCORP9 expressed particularly poorly, the highest
expressing line being 3 µg/ml compared to 108 µg/ml for the wild type cDNA. This
indicates that this mutant cDNA is particularly difficult to express at high levels
and therefore is a very exacting test of any cDNA expression system.

All references to the DNA sequence of the β-lactoglobulin gene utilise the
numbering of the sequence allocated EMBL Accession No. X12817 (Harris et al.,
NAR 16: 10379-80 (1988)).

EXAMPLE 

General 

Where not specifically detailed, recombinant DNA and molecular biological
procedures were after Maniatis et al. ("Molecular Cloning", Cold Spring Harbor
(1982)); "Recombinant DNA", Methods in Enzymology Volume 68 (edited by R.
Wu), Academic Press (1979); "Recombinant DNA part B", Methods in
Enzymology Volume 100 (Wu, Grossman and Moldave, Eds), Academic Press
(1983); "Recombinant DNA part C", Methods in Enzymology Volume 101 (Wu,
Grossman and Moldave, Eds), Academic Press (1983); and "Guide to Molecular
Cloning Techniques", Methods in Enzymology Volume 152 (edited by S.L. Berger
& A.R. Kimmel), Academic Press (1987). Unless specifically stated, all chemicals
were purchased from BDH Chemicals Ltd, Poole, Dorset, England or the Sigma
Chemical Company, Poole, Dorset, England. Unless specifically stated, all DNA
modifying enzymes and restriction endonucleases were purchased from BCL,
Boehringer Mannheim House, Bell Lane, Lewes, East Sussex BN7 1LG, UK.

[Abbreviations: bp = base pairs; kb = kilobase pairs; AAT = alpha-1-antitrypsin;
BLG = beta-lactoglobulin; FIX = factor IX; E. coli = Escherichia coli; dNTPs
= deoxyribonucleotide triphosphates; restriction endonucleases are abbreviated thus
e.g. BamHI; the addition of -O after a site for a restriction endonuclease e.g.
PvuII-O indicates that the recognition site has been destroyed].

Construction of Plasmids
Vectors

Plasmid pUCPM

The multiple cloning site of the vector pUC18 (Yanisch-Perron et al., (1985) Gene
33: 103-119) was removed and replaced with a synthetic, double stranded,
oligonucleotide containing the new restriction sites: PvuI/MluI/SalI/EcoRV/XbaI/
PvuI/MluI, and flanked by 5'-overhangs compatible with the restriction sites
EcoRI and HindIII. pUC18 DNA was cleaved with both EcoRI and HindIII and
the new linker DNA was ligated into pUC18. The DNA sequence across the new
multiple cloning site was confirmed. This new vector was called pUCPM.



Plasmid pUCXS

The β-lactoglobulin gene sequences from plasmid pSS1tgXS (see WO-A-9201358)
were excised on a SalI-XbaI fragment and recloned into the vector pUCPM, cut
with SalI and XbaI, to give plasmid pUCXS.

Plasmid pUCXS/RV

The plasmid pSS1tgSE (see WO-A-8800239) contains β-lactoglobulin gene
sequences from the SphI site at position 754 to the EcoRI site at 2050, a region
spanning a unique NotI site at position 1148. This insert contains a single PvuII
site (832) which lies in the 5'-untranslated region of the β-lactoglobulin mRNA.
Into this site was blunt-end ligated a double stranded, 8bp, DNA linker encoding
the recognition site for the enzyme EcoRV, to give the plasmid pSS1tgSE/RV. The
DNA sequences bounded by SphI and NotI were then excised and used to replace
the equivalent fragment in the plasmid pUCXS, thus effectively introducing a
unique EcoRV site into the β-lactoglobulin gene placed in such a way as to allow
the insertion of any additional DNA sequences under the control of the
β-lactoglobulin gene promoter and 3' to the initiation of transcription. The resulting
plasmid was called pUCXS/RV.



Plasmid pUCSV 

A derivative of pUCXS/RV, containing only the 4.3 Kbp of the β-lactoglobulin
gene which lie 5' to the transcription initiation site (the promoter), was constructed
by subcloning the SalI-EcoRV fragment into pUCPM; this plasmid is called
pUCSV.

Plasmid pBLAC100

A fragment of the 3' flanking sequence of the β-lactoglobulin gene was subcloned
in such a way as to eliminate all introns. Plasmid DNA of pUCXS/RV was
partially digested with SmaI by performing an enzyme titration with lower and
lower concentrations of enzyme at a fixed DNA concentration. The SmaI protein
was removed by phenol-chloroform extraction and ethanol precipitation and the
DNA resuspended in water. This DNA was subsequently digested to completion
with the enzyme XbaI. DNA cut once at the SmaI site, position 5286, and then
cleaved with XbaI gave a characteristic band of size 2.1 Kbp. This band was
purified from an agarose gel slice and ligated into SmaI and XbaI cut pBSIISK+
(Stratagene Ltd., Cambridge Science Park, Cambridge, UK) to give the plasmid
pBLAC100.

Plasmid pMAD 

The β-lactoglobulin cloning vector pMAD was constructed to allow rapid insertion
of cDNAs under the control of the β-lactoglobulin gene promoter and 3'-flanking
sequences. Such constructs contain no introns. The plasmid pBLAC100 was
opened by digestion with both EcoRV and SalI, and the vector fragment was gel
purified. Into this was ligated the 4.3 Kbp promoter fragment from the plasmid
pUCSV as a SalI-EcoRV fragment. This construct is termed pST1 and constitutes
a β-lactoglobulin mini-gene comprising the 4.3 Kbp promoter and 2.1 Kbp of 3'
flanking sequences. A unique EcoRV site is present to allow blunt-end cloning of
any additional DNA sequences. In order to allow excision of novel β-lactoglobulin
gene constructs with the enzyme MluI, the entire mini-gene from pST1 was excised
on a XhoI-NotI fragment, the DNA termini made flush with Klenow polymerase,
under standard conditions, and blunt-end cloned into the EcoRV site of pUCPM to
give pMAD.

Plasmid pMAD6

Previously described in WO 95/23868, and shown in Figures 1 and 2.

Plasmid pMAD1

Two primers, complementary to sequences at the 5' and 3' boundaries of the first
intron of the ovine BLG gene, were used to amplify a ~650bp fragment
encompassing the entire sequence of intron 1 of the BLG gene from pUCXS/RV
template. The primers introduce a 5' SmaI site and a 3' EcoRV site at the ends of
the PCR fragment. This fragment was cloned in EcoRV digested pBluescriptSK
to which single 3' dATP overhangs were added, using Taq polymerase. This
construct was named pSTI1. The orientation of the insert with respect to the
multiple cloning site in pSTI1 was determined by restriction digestion.

The intron 1 sequence was excised from pSTI1 on a 5' SmaI - 3' HindIII fragment,
the recessed 3' terminus generated at the HindIII end was repaired using Klenow,
and the resulting blunt ended fragment was ligated with EcoRV digested pMAD to
make pMAD1. The correct orientation of the intron fragment with respect to the
remainder of the BLG sequences was determined by DNA sequencing. This step
effectively moves the EcoRV site to the 3' end of the BLG intron.

Plasmid pMAD16

This was constructed using essentially the same strategy as that described for
pMAD1, except that in the final cloning step the BLG intron was ligated with
EcoRV cleaved pMAD6 (instead of pMAD) to construct pMAD16.

Expression Vectors



Plasmid pCORP2 (see WO-A-9211358)

A 1450bp cDNA of the human protein C gene, flanked by KpnI sites, was obtained
in the form of plasmid pWAPC2. The cDNA was excised as a KpnI fragment, the
3' overhangs made flush by treatment with T4 DNA polymerase, the fragment gel
purified and blunt-end cloned into the EcoRV site of pMAD. Orientation was
determined by restriction digest and confirmed by DNA sequencing. This
construct is plasmid pCORP2 and contains the human protein C cDNA under the
transcriptional control of the β-lactoglobulin gene 5' and 3' flanking sequences.
There are no introns.

The 1450bp protein C cDNA fragment used in the construction of pCORP2 was
placed into pMAD16 to make pCORP5.

Plasmid pCORP9

To facilitate the cloning of the protein C cDNA, PC962 (Foster et al., ibid),
into pMAD, the plasmid was modified to incorporate EcoRV sites at the
extremities of the protein C cDNA insert. A 769 bp SstII-PstI fragment
encompassing the 3' end of PC962 was cloned between the SstII and PstI sites of
pBluescript II SK+ (Stratagene, La Jolla, CA). The fragment was excised with
SstII/EcoRV and purified. The 5' portion of PC962 was modified by PCR. The
sense oligonucleotide primer for this reaction covered the 5' ATG region of the
cDNA and provided an EcoRV site upstream of this in the product. The
antisense oligonucleotide primer covered the SstII site used to generate the
SstII-EcoRV fragment. The resulting PCR product was digested with EcoRV and
SstII and ligated with the SstII-EcoRV 3' fragment and EcoRV digested pMAD.
The resulting plasmid, designated pCORP9, effectively contained the PC962
cDNA flanked by EcoRV sites in an intronless fusion driven by the
β-lactoglobulin promoter.

Plasmid pCORPU 

A genomic DNA construct, containing exons I through VIII of the human protein
C gene, was made. This genomic construct, designated GPC10-1, changed the
sequence 16 base pairs upstream of the ATG from the native protein C sequence
to the β-lactoglobulin sequence and introduced mutations in the propeptide
cleavage site located in exon 2, and the two-chain cleavage site located in exon 6,
as described below. The construct was assembled using four fragments
designated A, B, C and D and encompassed the protein C gene sequence from
the ATG to a BamHI site in exon VIII, immediately upstream of the stop codon.
The fragments were generated from a human genomic library in Charon 4A
phage which was screened with a radiolabeled cDNA probe for human protein C.
The screening of the λ library produced three clones that together mapped the
entire protein C gene (Foster et al., 1985, Proc. Natl. Acad. Sci. USA, 82:
4673-4677). These clones were designated PC λ1, PC λ6 and PC λ8. Fragment
A was a NotI to EcoRI fragment that contained exons I and II of the genomic
sequence and was 1698 bp. A subclone of PC λ6 contained an EcoRI to EcoRI
fragment and was designated pHCR4.4-1. Using pHCR4.4-1 as a template and
oligonucleotides ZC6303 (5'-ATT TGC GGC CGC CTG CAG CCA TGT GGC
AGC TCA CAA GCC TCC TGC-3') and ZC6337 (5'-CAG GAA GGA GTT
GGC GCG CTT GCG CCG TTG CAG CAC CTG CTG GGC-3'), a DNA
fragment was generated by polymerase chain reaction (PCR). Oligonucleotide
ZC6303 changed the sequence 16 base pairs 5' to the ATG sequence from the
native protein C sequence to the equivalent sequence from the β-lactoglobulin
gene and introduced a NotI site.

Oligonucleotide ZC6337 changed the propeptide cleavage site from
Arg-Ile-Arg-Lys-Arg to Gln-Arg-Arg-Lys-Arg. The resulting PCR generated fragment was
digested with NotI and BssHII, and a 1402 base pair fragment was gel purified
and designated A1. A second fragment was prepared using a λgt11 clone of PC
λ1 as a template with oligonucleotides ZC6306 (5'-CTT CTT CCT GAA TTC
TGT TTC TTG C-3') and ZC6338 (5'-CGG ATC CGC AAG CGC GCC AAC
TCC TTC C-3') in a polymerase chain reaction. The resulting DNA fragment,
designated A3, was digested with BssHII and EcoRI and gel purified, resulting in
a 296 base pair fragment.
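
The effect of ZC6337 on the propeptide cleavage site can be checked by reading the oligonucleotide on the coding strand. The following is a minimal sketch, assuming that ZC6337 is the antisense primer and that the reading frame starts at the first base of its reverse complement (neither is stated explicitly in the text); the function names are illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons in frame 0 into one-letter amino acids."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# ZC6337 as printed in the text, spaces removed.
ZC6337 = "CAG GAA GGA GTT GGC GCG CTT GCG CCG TTG CAG CAC CTG CTG GGC".replace(" ", "")

sense = reverse_complement(ZC6337)   # assumed coding-strand orientation
peptide = translate(sense)           # assumed reading frame

print(sense)
print(peptide)
print("Gln-Arg-Arg-Lys-Arg encoded:", "QRRKR" in peptide)
print("BssHII site (GCGCGC) present:", "GCGCGC" in sense)
```

Under these assumptions the translated stretch contains QRRKR (Gln-Arg-Arg-Lys-Arg), and the coding strand carries a GCGCGC motif, which is consistent with the BssHII digestion used to join fragments A1 and A3.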

Fragments A1 and A3 were ligated into the Bluescript II KS+ phagemid vector
(Stratagene, La Jolla, CA). The resulting plasmid, designated GPC 2-2, was
digested with NotI and EcoRI, gel purified and the NotI-EcoRI DNA fragment
was designated Fragment A.

pCR 2-14 is a subclone which contains an EcoRI to EcoRI DNA fragment of PC
A* (Foster et al., 1985, ibid.). The plasmid was digested with EcoRI and SstI
and gel purified. The resulting fragment was designated Fragment B.



Plasmid pCR 2-14 was used as a template DNA with oligonucleotides ZC6373
(5'-AAA GTA AAA AAA GAT CTA AAA ATT TAA C-3') and ZC6305 (5'-GTG
TCT CGT TTT CTT AAG TGA CTG CGC-3'), which introduced an AflII
site and the RRKR mutation of the native (KR) two-chain cleavage site, in a
polymerase chain reaction. The resulting PCR-generated fragment was digested
with BglII and AflII and gel purified, resulting in a 1441 base pair fragment,
designated E1. Fragment E1 was used in a ligation reaction with
oligonucleotides ZC6302 (5'-TTA AGA AGA AAA CGA GAC ACA GAA
GAC CAA GAA GAC CAA GTA GAT CCG C-3') and ZC6304 (5'-GGA
TCT ACT TGG TCT TCT TGG TCT GTG TCT CGT TTT CTT C-3'). These
oligonucleotides form AflII and SstII restriction sites when annealed and were
ligated to the 3' end of fragment E1, resulting in a fragment with a 5' BglII site
and a 3' SstII site. This fragment was used in a ligation reaction with a
BamHI-digested Bluescript II KS+ phagemid vector (Stratagene). The resulting
plasmid was designated GPC 8-5 and digested with SstI and SstII, generating a
626 base pair fragment, designated Fragment C.



A fourth fragment was generated by digestion of a genomic subclone (pHCB7-1)
of PC λ8. pHCB7-1 contained a BglII to BglII fragment that encompassed
exons VI through VIII. pHCB7-1 was digested with SstII and BamHI and a 2702
base pair fragment was gel purified. The fragment was designated Fragment D.

A five-part ligation reaction was prepared using NotI and BamHI digested and
linearized Bluescript II KS+ phagemid vector (Stratagene) with Fragment A (5'
NotI to 3' EcoRI) that contained exons I and II, Fragment B (5' EcoRI to 3' SstI)
that contained exons III, IV and V, Fragment C (5' SstI to 3' SstII) that contained
the 5' portion of exon VI and Fragment D (5' SstII to BamHI) that contained the
remaining 3' portion of exon VI and exons VII and VIII.

The resulting DNA was 8950 base pairs and designated GPC 10-1.

GPC 10-1 was originally generated with BLG sequences and a NotI site upstream
of the ATG initiator codon and modifications to both cleavage sites. A clone,
designated pPC12/BS, was generated to ensure that the 5' NotI site of GPC10-1
would not introduce secondary structure into mRNA molecules that could hinder
translation. pPC12/BS was generated using PCR amplification of a 1 kb
NotI-ScaI fragment that covered the 5' region of the protein C gene and contained the
wild-type ATG codon environment. This introduced an EcoRV site immediately
downstream of the NotI site, adjacent to the ATG codon, and a BamHI site was
incorporated 3' of the ScaI site to facilitate cloning. Following a NotI/BamHI
digestion, the PCR product was cloned into NotI/BamHI digested Bluescript II
KS+ phagemid vector (Stratagene). The NotI-EcoRV-ScaI fragment present in
pPC12/BS was excised, purified and ligated to GPC10-1, which had been
linearized with NotI and partially digested with ScaI (the pUC ampicillin gene
has an internal ScaI site). The resulting clone was designated GPC 10-2 and
possesses an EcoRV site immediately upstream of the ATG initiator codon.
GPC10-1 and GPC10-2 both terminated at the final BamHI site in exon VIII of
the protein C gene. To reconstitute the 56 bp of sequence, ending at the
termination codon, two oligonucleotides were synthesized with flanking BamHI
(5') and BglII (3') restriction sites. Following annealing of the oligonucleotides,
the product was cloned into BamHI digested pBST+ to generate plasmid pPC3'.
pBST+ is a derivative of pBS (Stratagene) with a new polylinker. The addition
of the polylinker added BglII, XhoI, NarI and ClaI restriction sites from the
vector polylinker downstream of the destroyed BglII site of the oligonucleotide
construct.



The NotI-BamHI fragment of GPC10-1 was subcloned into NotI/BamHI digested
pPC3' to add 3' coding sequences of protein C, the TAG termination codon
followed by BglII-XhoI-NarI-ClaI. The 3' region of the protein C gene
beginning with the EcoRV site in intron V was excised from this plasmid on an
EcoRV-ClaI fragment.

The EcoRV-EcoRV fragment from GPC10-2, covering the 5' portion of the
protein C gene, and the above EcoRV-ClaI fragment covering the 3' portion of
the protein C gene were combined between the EcoRV and ClaI sites of pMAD6
to generate pCORP13. This effectively placed a genomic portion of the protein
C gene with modified propeptide and two-chain cleavage site under the control
of the β-lactoglobulin promoter.

A further genomic construct was generated from pCORP13 which contained only
the modified two-chain cleavage site. This was achieved using PCR amplification
to modify two fragments which result in restoration of the coding capability of
exon 2 from the mutant Gln-Arg-Arg-Lys-Arg to the wild-type
Arg-Ile-Arg-Lys-Arg. pCORP13 was used as template for these reactions. The first fragment was
1.3kb, which encompassed the 5' end of the protein C gene up to the BamHI site
in exon 2. For this reason, the sense primer was designed to add a HindIII site
5' to the EcoRV site proximal to the ATG initiation codon. The antisense primer
was designed to restore the wild-type sequences in exon 2, which included a
restored BamHI site. A second fragment of 0.2kb, from the BamHI site in exon 2
to the XhoI site in intron 2, was amplified. The two fragments were combined in
pGEMO (Promega, Madison, WI) to generate pGEMPC1.5. A 7.5kb XhoI
fragment from pCORP13 was ligated to XhoI digested pGEMPC1.5 to generate a
complete protein C genomic sequence covering exons 1-8 with a wild-type
propeptide cleavage site and a modified two-chain cleavage site. The plasmid
was designated pGEMPC14. The sequence was excised from pGEMPC14 as a
HindIII/SalI fragment. The DNA termini were repaired using a Klenow reaction
and the fragment was blunt-end ligated into EcoRV digested pMAD6 to generate
pCORP14.

Plasmid pCORP16

The modified protein C cDNA (PC962) was excised from the plasmid pCORP9
(see above) as an EcoRV fragment and ligated with EcoRV digested pMAD6. The
resulting construct has been named pCORP16.

The Vector pCASMAD6

Plasmid pE'10

The bovine β-casein intron 1 (BBCI 1; approximately 2 Kbp) was PCR amplified
from dairy cow DNA using the primers BOVCAS1 (5'-AGG CCT ATT CAG
CTC CTC CTT CAC TTC TT-3') and BOVCAS2 (5'-GAT ATC GGC TCT CAA
TTC CTG GGA ATG GG-3'). The 5' primer incorporates a StuI site and the 3' primer
incorporates an EcoRV site. The purified 2 Kb fragment was cloned into the
pGEM-T vector (Promega) to give construct pE'10.
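
The cloning sites engineered into the two primers can be confirmed by a simple string search. The following is a minimal sketch: the recognition sequences for StuI (AGGCCT) and EcoRV (GATATC) are the standard ones, while the dictionary layout and the function name `find_sites` are illustrative assumptions rather than anything described in the text.

```python
# Recognition sequences of the two enzymes named for the primers.
SITES = {"StuI": "AGGCCT", "EcoRV": "GATATC"}

# Primer sequences as printed, 5' to 3', with spaces removed.
PRIMERS = {
    "BOVCAS1": "AGG CCT ATT CAG CTC CTC CTT CAC TTC TT".replace(" ", ""),
    "BOVCAS2": "GAT ATC GGC TCT CAA TTC CTG GGA ATG GG".replace(" ", ""),
}


def find_sites(seq, sites=SITES):
    """Map enzyme name to the 0-based position of its first site in seq."""
    return {name: seq.find(site) for name, site in sites.items() if site in seq}


for name, seq in PRIMERS.items():
    print(name, len(seq), "nt", find_sites(seq))
```

Each primer carries its site at the extreme 5' end, so the amplified intron is flanked by StuI upstream and EcoRV downstream, as the text describes.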

Plasmid pCASMAD6

pMAD6 was modified by inserting a linker, containing SpeI/NotI/SacII sites, into
the EcoRV site. Both orientations of the linker were obtained and thus two new
cloning vectors were obtained. These were called pMAD6/STOPS (5'
SacII/NotI/SpeI 3') and pMAD6/SPOTS (5' SpeI/NotI/SacII 3').

BBCI 1 was excised from pE'10 on a SacII and SpeI partial digest (due to an internal SpeI
site in the β-casein intron) and cloned into SacII/SpeI digested pMAD6/SPOTS.
The new vector was called pCASMAD6.

Plasmid pCOR169

The modified protein C cDNA (PC962) was excised from the plasmid pCORP9
(see above) as an EcoRV fragment and ligated with EcoRV digested pCASMAD6.
This places the AUG translation start downstream with respect to the β-casein
intron sequence. The resulting construct was named pCOR169.

The Vector pACTMAD6

Plasmid pGEM-AI

Two primers (ACTP1: 5'-AGG CCT AGT GCC TGC CAC CAG
CGC CAG CC-3'; ACTP2: 5'-GAT ATC CCT GGC ACA GCT TTG TGT GGT
TC-3'), complementary to the opposing strands of the 3' end of the first exon and
the 5' end of the second exon of the murine cardiac actin gene respectively, were
used in a PCR reaction to amplify a 0.8 Kb fragment encompassing the intervening
sequences from a template of mouse genomic DNA. The two primers introduced a
5' SnaBI and a 3' EcoRV restriction site at the ends of the PCR product. This
DNA fragment was cloned in pGEM-T to give a construct which was named
pGEM-AI. DNA sequence analysis confirmed that the sequence of the amplified
product beyond the primers matched that published for the murine beta actin gene.

Plasmid pACTMAD6

The actin intron 1 sequence was excised from pGEM-AI on a 5' SnaBI - 3' EcoRV
fragment which was then ligated with EcoRV digested pMAD6 to give vector
pACTMAD6. This cloning step effectively moves the EcoRV site from the 3' end
of the BLG promoter downstream to the 3' end of the actin gene intron segment.

Plasmid pCOR170

The modified protein C cDNA (PC962) was excised from the plasmid pCORP9 as
an EcoRV fragment and ligated with EcoRV digested pACTMAD6. This places
the AUG translation start downstream with respect to the actin intron sequence.
The resulting construct was named pCOR170.

The Vector pBOB

Plasmid pBOB

PCR primers were designed to amplify the region of the ovine β-casein gene from
exon 1 to exon 2 (BOB1: 5'-CGG GAT CCG TCG ACC ATT CAG CTT CTC
CTT CAC TTC TTC TC-3'; BOB2: 5'-CGG GAT CCG GGT CCC TAC GTA
GGC TCT CGA TTC CTG TGA ATG GGA-3'). The size of this product is
2.1Kbp and has, engineered into it, the sites BamHI/SalI at the 5' end and
BamHI/PpuMI/SnaBI at the 3' end.
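
Because PpuMI has a degenerate recognition sequence (RGGWCCY), locating the engineered sites in BOB1 and BOB2 needs pattern matching rather than a plain substring search. The following is a minimal sketch using IUPAC codes expanded into regular expressions; the recognition sequences are the standard ones for these enzymes, and the helper names `site_regex` and `map_sites` are illustrative assumptions.

```python
import re

# IUPAC nucleotide codes needed for the sites below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}

SITES = {
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "PpuMI": "RGGWCCY",
    "SnaBI": "TACGTA",
}

PRIMERS = {
    "BOB1": "CGG GAT CCG TCG ACC ATT CAG CTT CTC CTT CAC TTC TTC TC".replace(" ", ""),
    "BOB2": "CGG GAT CCG GGT CCC TAC GTA GGC TCT CGA TTC CTG TGA ATG GGA".replace(" ", ""),
}


def site_regex(site):
    """Compile a recognition sequence into a regex; the lookahead finds overlapping hits."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in site) + ")")


def map_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every enzyme that cuts seq."""
    hits = {}
    for name, site in sites.items():
        positions = [m.start() for m in site_regex(site).finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


for name, seq in PRIMERS.items():
    print(name, map_sites(seq))
```

With the sequences as printed, BOB1 carries BamHI followed by SalI, and BOB2 carries BamHI, PpuMI and SnaBI in that order, matching the site arrangement stated in the text.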

The construction of pBOB took place in three steps: the 2.1Kbp PCR fragment
(above) was blunt-end cloned into the EcoRV site of pBSIISK+ (Stratagene) to
give the plasmid pOβ-casEx1/2. The 6.5Kbp PpuMI fragment from the ovine
β-casein gene, containing exons 7 to 9 and 3' flanking sequences, was cloned into the
now unique PpuMI site of pOβ-casEx1/2. A clone was obtained with the 6.5Kbp
fragment in its natural orientation with respect to the first intron and this clone was
named pBOBΔprom.

Previously, a XhoI linker had been cloned into the EcoRV site of pMAD6 and the
modified plasmid named pMAD6X. Finally, the ovine BLG promoter from
pMAD6X was cloned into the SalI site of pBOBΔprom as a SalI/XhoI fragment
giving rise to pBOB.

Plasmid pCORB71

The modified protein C cDNA (PC962) was excised from the plasmid pCORP9 as
an EcoRV fragment and ligated with EcoRV digested pBOB. This places the
AUG translation start downstream with respect to the actin intron sequence. The
resulting construct was named pCORB71.

Example 1

Analysis of constructs of the present invention in the expression of protein C in
transgenic animals. Results are shown in Tables 1 and 2.

Generation of transgenic animals

Transgenic mice were generated as described in Prunkard et al. (Nature
Biotechnology, 14:867-871, 1996).

Protein C Assays

Human protein C in the milk of transgenic animals was assayed according to the
following procedure:

Protein C Standard

Purified human Protein C stored at 50 µg/ml in Phosphate Buffered Saline
(PBS)/1% Bovine Serum Albumin (BSA) at 20°C. Dilute to 500 ng/ml in blocking
buffer for use. Standard curve range of ELISA is 3.9-125 ng/ml.
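
The stated range of 3.9-125 ng/ml is what a doubling dilution series of six points starting at 125 ng/ml produces, and the assay method relates optical density (O.D.) to log concentration through a regression line. The following is a minimal sketch of that arithmetic. The O.D. readings are invented purely for illustration (they are generated from an assumed slope and intercept, not taken from any measurement), and the function names are hypothetical; the actual data handling in the text is done by the plate-reader software.

```python
import math


def doubling_series(top, n_points):
    """Concentrations obtained by repeatedly halving the top standard."""
    return [top / 2 ** i for i in range(n_points)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_od(od, slope, intercept):
    """Invert O.D. = slope * log10(conc) + intercept to recover conc (ng/ml)."""
    return 10 ** ((od - intercept) / slope)


# Six standards from 125 ng/ml downwards (assumed number of points).
standards = doubling_series(125.0, 6)

# Hypothetical O.D. readings, NOT real data: an idealised linear response.
hypothetical_od = [0.1 + 0.5 * math.log10(c) for c in standards]

slope, intercept = fit_line([math.log10(c) for c in standards], hypothetical_od)

sample_od = 0.95  # hypothetical reading for a diluted milk sample
print("standards (ng/ml):", standards)
print("slope, intercept:", slope, intercept)
print("sample (ng/ml):", concentration_from_od(sample_od, slope, intercept))
```

The series evaluates to 125, 62.5, 31.25, 15.625, 7.8125 and 3.90625 ng/ml, which rounds to the quoted lower limit of 3.9 ng/ml. A sample value read off the line still has to be multiplied by the sample's own dilution factor.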

Blocking buffer
1X PBS
5% Milk powder
0.01% Tween 20

Wash buffer
1X PBS
0.05% Tween 20

Coating Antibody

Dako Rabbit Anti-human Protein C antibody diluted to 10 µg/ml in PBS.

Detection Antibody

Dako Rabbit Anti-human Protein C antibody Peroxidase conjugate.
Dilute 1/5000 in blocking buffer.

Substrate

TMB 1 Component Peroxidase substrate

Stop Solution

0.2M Sulphuric acid


Method

1. Coat Costar High Binding capacity 96 well plate with 150 µl/well of coating
antibody. Incubate in a damp box overnight in fridge.

2. Wash wells with wash buffer (3X 200 µl/well).

3. Load 100 µl blocking buffer into wells.

4. Dilute samples appropriately in blocking buffer.
Reference human plasma is used as a protective control at 1/40 dilution.

5. Load standard and samples into plate by columns with doubling dilution
(100 µl per well). Standard is loaded in rows 1 and 12 in duplicate.

6. Incubate for 2 hours in damp box in fridge.

7. Wash plate with wash buffer (3X 200 µl/well).

8. Load 100 µl/well peroxidase conjugate. Incubate in damp box for 2 hours
in fridge.

9. Wash plate with wash buffer (3X 200 µl/well). Drain plate.

10. Add 100 µl/well of substrate and leave for 5 minutes.
Stop reaction by addition of 100 µl/well of stop solution.

11. Read plate on plate reader with 650nm filter.

12. Plot standard curve using mean of duplicates (O.D. v. log concentration PC)
and calculate regression line equation. Use equation to calculate sample
values. Data handling can be performed by PC assay programme on Dynex
Revelation software.

Example 2 

Expression of AAT from cDNA in constructs according to the present invention.

Transgenic mice were prepared as in Example 1. Analysis of AAT in the milk of
transgenic mice was according to standard procedures, for example as described in
Wright, G., Carver, A., Cottom, D., Reeves, D., Scott, A., Simons, J.P.,
Wilmut, I., Garner, I., and Colman A., 1991. High level expression of active
human alpha-1-antitrypsin in the milk of transgenic sheep. Bio/Technology 9: 830-834.

Results are shown in Table 3.

Example 3

Expression of antibody fragment from constructs according to the present 
invention. 

Constructs pMAD6 and pCASMAD6 were prepared incorporating DNA encoding
an antibody binding fragment to give constructs pMAD6-AB and pCASMAD6-AB.
The constructs were used to obtain transgenic mice according to Example 1.
Expression of the antibody fragment was determined by standard protocols.

No expression of the antibody fragment was found in the transgenic mice with
pMAD6-AB. Levels ranging from 0 to 129 µg/ml were found in pCASMAD6-AB
mice.

The results are given in Table 4.



WO 99/03981 



PCT/GB98/02130 




Example 4 

Expression of IgG from constructs according to the present invention.

Constructs pMAD6 and pCASMAD6 were prepared incorporating DNA encoding
IgG to give constructs pMAD6-IgG and pCASMAD6-IgG. The constructs were
used to obtain transgenic mice according to Example 1. Expression of IgG in the
mouse milk was determined by standard ELISA protocol.

Results are given in Table 5.

Example 5

Expression of adhesion molecule (soluble) from constructs according to the present 
invention. 

Constructs pMAD6 and pCASMAD6 were prepared incorporating DNA encoding 
a soluble adhesion molecule (SAM) to give constructs pMAD6-SAM and 
pCASMAD6-SAM. Transgenic mice were prepared according to Example 1.

The expression level range (µg/ml) in pCASMAD6-SAM transgenic mice was up
to 500. The maximum level detected in the pMAD6-SAM transgenic mice was 80.

Example 6

Expression of collagen cDNA from constructs according to the present invention.

The pCASMAD6 vector was used. Collagen cDNA (human truncated pro-collagen
α2(I) homotrimer) was inserted as the DNA for the protein of interest. This was
coinjected with two transgenes expressing α and β subunits of prolyl
4-hydroxylase, an enzyme for the post-translational modification of procollagen.

Transgenic animals were obtained as in Example 1. Determination of collagen
expression in mouse milk was according to standard protocols, as
described in WO97/08311.

5 

Eleven transgenic lines bearing the collagen cDNA construct and the two prolyl
4-hydroxylase transgenes were analysed. Three lines were found to express
procollagen α2(I) homotrimer protein. The amount of collagen present was
estimated by measurement of hydroxyproline content and by Western analysis in
comparison with a bovine collagen standard. Three independently derived mouse
lines were found to express detectable amounts of collagen. The levels present in
the milk of these three lines were estimated as 10 µg/ml, 30 µg/ml and
120-240 µg/ml. Collagen protein was absent from the milk of non-transgenic
mice. Milk from the highest expressing line was analysed further, and the
procollagen present was found to have formed a correctly aligned triple helical
molecule.

These results demonstrate secretion of relatively high levels of recombinant
procollagen in the milk of transgenic mice by expression of cDNA under the
control of the β-lactoglobulin promoter.

From the foregoing it will be appreciated that although specific embodiments of the
invention have been described herein for purposes of illustration, various
modifications may be made without deviating from the spirit and scope of the
invention. Accordingly, the invention is not limited. All documents and papers
cited or mentioned herein are fully incorporated by reference.
TABLE 1

Data supporting the improved expression from constructs according to the
invention

Logic                                     cDNA          Construct

Addition of an intron          PCORP3*    Wild type*    pMAD6
does NOT result in high        PCORP16    Mutant*       pMAD6
expression

Addition of ANY intron         PCORP4     wild type     pMAD1
into the 5'UTR is              PCORP5     wild type     pMAD16
insufficient

Addition of the naturally      PCOR169    mutant        pCASMAD6
occurring β-casein 5'UTR
intron results in high
expression of ANY cDNA

Addition of ANY naturally      PCOR170    mutant        pACTMAD6
occurring 5'UTR intron
results in high
expression of ANY cDNA

* refers to different protein C cDNAs
TABLE 4

pMAD6-AB

Mouse No.    Expression (µg/ml)
#33          0
#34          0
#35          0
#57          0

pCASMAD6-AB

Mouse No.    Expression (µg/ml)
#14          11.8 ± 0.75
#55          36.17 ± 0.175
#58          21.76 ± 1.46
#60          20.8 ± 0.67
#69          0
#72          1.9 ± 0.01
#75          2.5 ± 0.01
#81          0
#102         0.44 ± 0.03
#110         0
#194         54.31 ± 7.1
#195         129.41 ± 1.38
#205         8.38 ± 2.41
#216         45.6 ± 0.12
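
The Table 4 values can be summarised per construct as the number of expressing mice and the highest level observed. The following is a minimal sketch only: the values are transcribed from Table 4 (means, without the error terms), a level above 0 is assumed to count as expression, and the names TABLE_4 and summarise are illustrative and do not come from the specification.

```python
# Mean expression per mouse (ug/ml), transcribed from Table 4.
# Keys are mouse numbers; error terms from the table are omitted.
TABLE_4 = {
    "pMAD6-AB": {33: 0.0, 34: 0.0, 35: 0.0, 57: 0.0},
    "pCASMAD6-AB": {
        14: 11.8, 55: 36.17, 58: 21.76, 60: 20.8, 69: 0.0, 72: 1.9,
        75: 2.5, 81: 0.0, 102: 0.44, 110: 0.0, 194: 54.31, 195: 129.41,
        205: 8.38, 216: 45.6,
    },
}


def summarise(levels):
    """Return (number of expressing mice, total mice, highest level)."""
    # Assumption: any level above zero counts as "expressing".
    expressing = [v for v in levels.values() if v > 0]
    return len(expressing), len(levels), max(levels.values())


for construct, levels in TABLE_4.items():
    n_expr, n_total, highest = summarise(levels)
    print(f"{construct}: {n_expr}/{n_total} mice expressing, max {highest} ug/ml")
```

On these figures none of the four pMAD6-AB mice express the fragment, whereas 11 of the 14 pCASMAD6-AB mice do.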
TABLE 5

Expression Levels of IgG in Transgenic Mouse Milk

TG exp. vector    Generation    IgG expression level
/-IgG                           ELISA (mg/l)

CASMAD6           G0            <0.05
CASMAD6           G0            80 - 102.4
CASMAD6           G0            22.4 - 35.8
CASMAD6           G0            <0.05
CASMAD6           G0            <0.05
MAD6              G0            0.5 - 0.8
MAD6              G0            2.3 - 3.5
MAD6              G0            <0.05
MAD6              G0            <0.05
MAD6              G0            <0.05
MAD6              G0            <0.05
MAD6              G0            <0.05
MAD6              G0            <0.05
CASMAD6           G1            0.18 - 0.2
CASMAD6           G1            0.5 - 0.6
CASMAD6           G1            15 - 16


1. A nucleic acid expression construct comprising:

(a) a promoter;

(b) an intron whose natural position is within the 5'-untranslated region
of a gene from which it is derived;

(c) a coding sequence; and

(d) a 3'-flanking sequence

wherein the intron (b) is not derived from the same gene as that from which either
the promoter (a) or the protein-coding sequence (c) is derived.
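
As an informal illustration only, and not as part of the claims, the "wherein" condition of claim 1 can be written as a simple check. This is a sketch under stated assumptions: the class name, field names and function name are hypothetical, and the example values (a β-lactoglobulin promoter, a β-casein 5'-UTR intron and a protein C coding sequence) are taken from the constructs described in this specification, with the 3'-flanking source assumed for the purpose of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    # Hypothetical labels for elements (a) to (d) of claim 1.
    promoter_gene: str     # gene from which promoter (a) is derived
    intron_gene: str       # gene from which intron (b) is derived
    intron_in_5utr: bool   # True if the intron naturally lies in the 5'-UTR
    coding_gene: str       # gene from which coding sequence (c) is derived
    flanking_3prime: str   # source of 3'-flanking sequence (d), assumed here


def satisfies_claim_1_condition(c: Construct) -> bool:
    """Intron is a natural 5'-UTR intron from a gene other than (a) and (c)."""
    return (
        c.intron_in_5utr
        and c.intron_gene != c.promoter_gene
        and c.intron_gene != c.coding_gene
    )


# A pCASMAD6-like arrangement: heterologous 5'-UTR intron.
pcasmad6_like = Construct(
    "beta-lactoglobulin", "beta-casein", True, "protein C", "beta-lactoglobulin"
)
# Counter-example: the intron comes from the same gene as the promoter.
same_gene_intron = Construct(
    "beta-casein", "beta-casein", True, "protein C", "beta-casein"
)

print(satisfies_claim_1_condition(pcasmad6_like))     # True
print(satisfies_claim_1_condition(same_gene_intron))  # False
```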



2. An expression construct as claimed in claim 1 wherein the promoter is a 
gene promoter which drives expression of the coding sequence (c) in mammalian 
cells, in particular, a milk protein promoter. 

3. An expression construct as claimed in claim 1 or claim 2 wherein the intron
(b) is the first intron from a gene where the intron is naturally located entirely
within the 5' untranslated region of the gene.

4. An expression construct as claimed in claim 1, 2 or 3, wherein the intron (b) is
the first intron from a gene which is a member of a family of genes where the
intron is naturally located entirely within the 5' untranslated region of the gene.

5. An expression construct as claimed in any one of claims 1 to 4 wherein the
intron is from the casein gene family.

6. An expression construct as claimed in any one of claims 1 to 4 wherein the 
intron is from the actin gene family. 



7. An expression construct as claimed in any one of claims 1 to 6 wherein the
3' flanking sequence is any sequence which supports the correct transcription
termination, mRNA 3' end processing, mRNA stabilisation, mRNA transport from
the nucleus to cytoplasm and mRNA translation.

8. An expression construct as claimed in any one of claims 1 to 7 wherein the
3'-flanking sequence is a poly-A site or a β-lactoglobulin gene 3'-sequence
beginning 3' to the natural β-lactoglobulin stop codon and continuing to at least
about 50 bases 3' of the poly-A site.

9. An expression construct as claimed in any one of claims 1 to 7 wherein the
3'-flanking sequence is a β-casein gene 3'-sequence beginning 3' to the natural
β-casein stop codon and continuing to at least about 50 bases 3' of the poly-A site.

10. A process for the preparation of a host organism, the process comprising
introducing an expression construct, as claimed in any one of claims 1 to 9, into a
suitable organism.

11. A process as claimed in claim 10 wherein the suitable organism is a
prokaryote, a fungus, a plant, an animal, or a eukaryotic cell.

12. A process as claimed in claim 11 wherein the animal is a non-human 
mammal. 

13. A host organism incorporating a DNA expression construct as claimed in
any one of claims 1 to 9.

14. A host organism as claimed in claim 13 which is a prokaryote (e.g. E. coli),
a fungus, a plant, an animal or a eukaryotic cell.

15. A host organism, as claimed in claim 14, wherein the animal is a non-human
mammal.

16. A process of preparing a protein, the process comprising allowing an
expression host to express a DNA expression construct as claimed in any one of
claims 1 to 9.

17. A process as claimed in claim 16, further including a process of purifying 
the protein. 

18. A process as claimed in claim 15 or claim 16 wherein the protein is protein
C, fibrinogen, AAT or collagen.

19. A process as claimed in claim 16 or claim 17 wherein the expression host is
a prokaryote, a fungus, a plant, an animal or a eukaryotic cell.

20. A process, as claimed in claim 19, wherein the animal is a non-human
mammal.

21. A protein prepared by a process as claimed in any one of claims 16 to 20.

22. The use of a nucleic acid expression construct comprising a promoter, an
intron whose natural position is within the 5' untranslated region of a gene from
which it is derived, a coding sequence and a 3' flanking sequence to obtain a
transgenic host.

23. The use of a nucleic acid construct comprising a promoter, an intron whose
natural position is within the 5' untranslated region of a gene from which it is
derived, a coding sequence and a 3' flanking sequence to increase the likelihood
of expression of the coding sequence from a transgenic host which incorporates
the nucleic acid construct.

24. A process for improving the number of transgenic hosts which express a
transgene coding sequence, the process comprising introducing into the host a
nucleic acid construct comprising a promoter, an intron whose natural position is
within the 5' untranslated region of a gene from which it is derived, a coding
sequence and a 3' flanking sequence.

25. The use as claimed in any one of claims 22 or 23 or a process as claimed 
in claim 24 wherein the construct is as set out in any one of claims 1 to 9. 
